prov/efa: Introduce PPS enhancement interface #12225
@mrgolin can you take a look?
| } | ||
| if (efa_env.enable_high_pps && (flags & FI_EFA_WR_HIGH_PPS)) |
Will efa_env.enable_high_pps be dropped eventually?
Yes, it is a temporary gate for this feature until the firmware deployment is fully completed.
0fda097 to 94ff076
| @pytest.mark.fabric(params=["efa", "efa-direct"]) | ||
| @pytest.mark.message_sizes(default_efa=PERF_SIZES, default_efa_direct=DIRECT_RMA_SIZES, | ||
| pr_ci_efa=PERF_PR_CI, pr_ci_efa_direct=DIRECT_RMA_SIZES) |
We don't need pr_ci_efa=PERF_PR_CI, pr_ci_efa_direct=DIRECT_RMA_SIZES if the test is not marked with @pytest.mark.pr_ci.
I see that all the tests above have this decorator, even though most of them don't have pytest.mark.pr_ci. I do want this test to run in PR CI. Which decorators exactly do you suggest? @charlesstoll
| /* | ||
| * EFA provider-specific operation flags (bits 60-63). |
nit: a sentence pointing to where flags 0-59 are defined could be useful
| } | ||
| if (efa_env.enable_processing_hints) |
Would FI_EFA_WR_HIGH_PPS ever be set if enable_processing_hints is not?
Maybe just check for the former here?
FI_EFA_WR_HIGH_PPS should always be exposed to applications, so we need the env check as a runtime gate. I will add a comment that this env is only a temporary gate until the firmware deployment, and that it should be removed after the deployment.
I added the environment variable gate as a separate commit, f8124ce, which should be reverted once the deployment is complete.
| { | ||
| if (flags & FI_EFA_WR_HIGH_PPS) { | ||
| uint8_t wqe_hints = 0; | ||
| wqe_hints |= EFA_IO_PROCESSING_HINT_BURST_PPS_SENSITIVE; |
This looks like you're preparing this function for more potential hints in the future, which I'm not sure we need at this point. But if you do, then the wqe_hints declaration and EFA_SET should be outside the HIGH_PPS if block.
My old implementation just called EFA_SET(...., 1) when (flags & FI_EFA_WR_HIGH_PPS). Then I saw that rdma-core's implementation assumes the hints can be extended, so I followed it.
I also thought about moving wqe_hints and EFA_SET outside the if block, but then they would always be executed for all flags, which is not necessary.
Updated the code to only call the whole processing hint function when (flags & FI_EFA_WR_HIGH_PPS)
| ibv_wr_set_ud_addr(qp->ibv_qp_ex, ah->ibv_ah, qpn, qkey); | ||
| #if HAVE_EFADV_WR_PROCESSING_HINTS | ||
| if (efa_env.enable_processing_hints && (flags & FI_EFA_WR_HIGH_PPS)) |
How can enable_processing_hints be set if HAVE_EFADV_WR_PROCESSING_HINTS is not defined? Maybe the macro check is redundant?
HAVE_EFADV_WR_PROCESSING_HINTS only guards the rdma-core interface, but the data path direct code path does not need a new rdma-core interface to enable processing hints. So the two checks are orthogonal.
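To illustrate why the compile-time macro and the runtime gate answer different questions, here is a small sketch. It assumes two post paths (direct and rdma-core) and uses hypothetical `SKETCH_` names throughout; `SKETCH_HAVE_RDMA_CORE_HINTS` stands in for the configure-time macro and defaults to 0 here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the configure result; 0 models an older rdma-core. */
#ifndef SKETCH_HAVE_RDMA_CORE_HINTS
#define SKETCH_HAVE_RDMA_CORE_HINTS 0
#endif

#define SKETCH_WR_HIGH_PPS (1ULL << 60)

enum sketch_path { SKETCH_PATH_DIRECT, SKETCH_PATH_RDMA_CORE };

/* Report whether the hint would be applied on a given post path. */
static bool sketch_hint_applied(enum sketch_path path, bool env_gate,
				uint64_t flags)
{
	bool requested = env_gate && (flags & SKETCH_WR_HIGH_PPS) != 0;

	/* The direct path writes the WQE itself, so only the runtime
	 * gate and the per-operation flag matter. */
	if (path == SKETCH_PATH_DIRECT)
		return requested;

#if SKETCH_HAVE_RDMA_CORE_HINTS
	/* The rdma-core path additionally needs the library setter. */
	return requested;
#else
	return false;
#endif
}
```

In this model the runtime gate can be on while the macro is off, and the direct path still applies the hint.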
Same as above: I added the environment variable gate as a separate commit, f8124ce, which should be removed once the deployment is complete.
18de125 to 8f410a3
| * such as the FI_EFA_WR_HIGH_PPS flag. It currently supports write, | ||
| * writedata, and read operations. | ||
| * | ||
| * Unlike fi_rma_bw, this test uses a nonblocking benchmark loop that |
Why not make this test fi_rma_bw_non_blocking?
@j-xiong do you have an opinion? Will this test be useful for other providers?
Yes, I think it is useful in general.
This test implements some EFA-specific flags. I can refactor it so that the non-blocking pattern moves into common code, but that will require more work.
Define FI_EFA_WR_HIGH_PPS as a provider-specific operation flag (bit 60) in fi_ext_efa.h. Applications pass this flag in fi_writemsg() to hint the device to optimize for higher message rate on RDMA write operations. Signed-off-by: Shi Jin <sjina@amazon.com>
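As a rough illustration of the flag layout described in this commit, the sketch below defines a local copy of the bit and two helpers. The real definition lives in fi_ext_efa.h; the `SKETCH_` names, the range mask, and both helper functions are hypothetical and only mirror the "bit 60" and "bits 60-63" statements in this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: local copy of the flag at bit 60, per the commit text. */
#define SKETCH_FI_EFA_WR_HIGH_PPS (1ULL << 60)

/* Hypothetical mask for the provider-specific range, bits 60-63. */
#define SKETCH_EFA_PROV_FLAG_MASK (0xFULL << 60)

/* Return true when the caller asked for the high-PPS hint. */
static bool sketch_wants_high_pps(uint64_t flags)
{
	return (flags & SKETCH_FI_EFA_WR_HIGH_PPS) != 0;
}

/* Drop provider-specific bits, leaving only bits 0-59. */
static uint64_t sketch_strip_provider_flags(uint64_t flags)
{
	return flags & ~SKETCH_EFA_PROV_FLAG_MASK;
}
```

An application would OR the real flag into the flags argument of fi_writemsg(); the helpers above only show how such a bit can be tested and masked.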
In efa_data_path_direct_post_write(), set the processing hint bit in the TX WQE metadata ctrl3 field when both efa_env.enable_high_pps and FI_EFA_WR_HIGH_PPS are set. Add efa_send_wr_set_processing_hint_high_pps() helper and the EFA_IO_TX_META_DESC_PROCESSING_HINT_MASK definition for the new ctrl3 field in efa_io_defs.h. Signed-off-by: Shi Jin <sjina@amazon.com>
Implement the rdma-core data path integration for the high PPS processing hint on RDMA write operations. During QP creation, set EFADV_WR_EX_WITH_PROCESSING_HINT in wr_flags to enable the WQE-level hint setter. In efa_ibv_post_write(), call efadv_wr_set_processing_hint() when both the feature flag and FI_EFA_WR_HIGH_PPS are set. Add configure checks for efadv_qp_from_ibv_qp_ex, wr_flags, and EFADV_WR_PROCESSING_HINT_BURST_PPS_SENSITIVE. All changes are guarded by HAVE_EFADV_WR_PROCESSING_HINT for backward compatibility with older rdma-core versions. Signed-off-by: Shi Jin <sjina@amazon.com>
Add fi_efa_rma_bw, an EFA-specific RMA bandwidth test that supports write and writedata operations with EFA-specific features such as the FI_EFA_WR_HIGH_PPS flag. Unlike fi_rma_bw, this test uses a nonblocking benchmark loop that interleaves posting and completion polling to keep the pipeline full, similar to the approach used by rdma-core/perftest. This avoids blocking at window boundaries and maximizes throughput. Signed-off-by: Shi Jin <sjina@amazon.com>
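The nonblocking loop described in this commit can be modeled with a toy queue. The sketch below is not the fi_efa_rma_bw source: it replaces real posts and completion-queue reads with counters, and all `sketch_` names are hypothetical. It only shows the interleaving of "post until the queue pushes back" with "reap whatever has completed".

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a transmit queue with a fixed depth. */
struct sketch_queue {
	size_t depth;        /* maximum operations in flight */
	size_t outstanding;  /* posted but not yet completed */
	size_t posted;       /* total operations posted */
	size_t completed;    /* total completions reaped */
};

/* Try to post one operation; -1 models a "try again" return. */
static int sketch_post(struct sketch_queue *q)
{
	if (q->outstanding == q->depth)
		return -1;
	q->outstanding++;
	q->posted++;
	return 0;
}

/* Reap up to max completions without waiting; returns the count. */
static size_t sketch_poll(struct sketch_queue *q, size_t max)
{
	size_t n = q->outstanding < max ? q->outstanding : max;

	q->outstanding -= n;
	q->completed += n;
	return n;
}

/* Interleave posting and polling until total operations complete.
 * Returns the number of loop iterations taken. */
static size_t sketch_run_nonblocking(struct sketch_queue *q, size_t total)
{
	size_t iterations = 0;

	assert(q->depth > 0);
	while (q->completed < total) {
		/* Refill the queue as far as it will accept. */
		while (q->posted < total && sketch_post(q) == 0)
			;
		/* Reap one completion, then go straight back to posting
		 * instead of draining a whole window first. */
		sketch_poll(q, 1);
		iterations++;
	}
	return iterations;
}
```

In this model the queue is topped up after every reaped completion, so it stays full until the last operations are posted. A window-based loop would instead let it drain to zero at each window boundary.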
Disable shm because it is an EFA-only test. Signed-off-by: Shi Jin <sjina@amazon.com>
Add enable_high_pps field to efa_env struct, gated by the undocumented FI_EFA_ENABLE_HIGH_PPS environment variable. This allows controlled rollout of PPS optimization before firmware deployment is complete. Signed-off-by: Shi Jin <sjina@amazon.com>
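A minimal sketch of such an environment gate follows. How the provider actually reads FI_EFA_ENABLE_HIGH_PPS is not shown in this thread, so the plain getenv() call, the rule that only "1" enables the feature, and the `sketch_` function names are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumed parsing rule: only the exact string "1" enables the gate. */
static bool sketch_parse_gate(const char *value)
{
	return value != NULL && strcmp(value, "1") == 0;
}

/* Read the gate once at startup; unset means disabled. */
static bool sketch_high_pps_enabled(void)
{
	return sketch_parse_gate(getenv("FI_EFA_ENABLE_HIGH_PPS"));
}
```

Keeping the default at "disabled" means applications can already pass the flag while the hint stays inert until the gate is opened.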
A series of commits that introduces an operation-level interface that lets users improve the packets-per-second (PPS) rate.